Biogenesis, conservation, and function of miRNA in liverworts

This review highlights recent findings on the liverwort miRNA repertoire, including Marchantia polymorphamiRNA gene organization, biogenesis, and functions, and microprocessor, auxiliary, and regulatory proteins involved in miRNA biogenesis.


Introduction
Since the discovery of microRNAs (miRNAs) in the early 1990s, these small non-coding RNAs have been found in almost all eukaryotic lineages (Lee et al., 1993;Carrington and Ambros, 2003;Moran et al., 2017;. miR-NAs are involved in various biological processes, including cell division, expansion, and differentiation, as well as in the This paper is available online free of all access charges (see https://academic.oup.com/jxb/pages/openaccess for further details) specification of organ identity, developmental phase transitions, responses to environmental stresses, and many others (Li and Zhang, 2016;Liu et al., 2018). Our knowledge of miRNA function in plants has been derived mainly from studies of model plants with sequenced genomes, most of which are angiosperms; thus, our knowledge relies on only one of the major land plant lineages. miRNAs are endogenous molecules 18-24 nucleotides in length that repress gene expression via mRNA cleavage or translational inhibition of targets. The miRNA-induced silencing complex (miRISC) is guided to a target mRNA by the complementary binding of a miRNA to its target mRNA (Bartel, 2009;Rogers and Chen, 2013;Yu et al., 2017). In higher plants, most miRNA genes (MIRs) are independent transcription units, with the rest being located within the introns of protein-coding or non-coding genes (Stepien et al., 2017). All plant MIR genes studied to date are transcribed by RNA polymerase II (RNA Pol II), which produces primary transcripts (pri-miRNAs) that are 5ʹ capped and 3ʹ polyadenylated. The characteristic feature of pri-miRNAs is that they can fold back into hairpin-like structures that house miRNA/miRNA* (miRNA strand/passenger strand) duplexes within the stem of the hairpin (Xie et al., 2005;Wang et al., 2019;. The process of maturation of the plant pri-miRNA to the final miRNA molecule takes place entirely in the cell nucleus, where it is directed by a multiprotein microprocessor in a two-step process (Fig. 1). First, the stem-loop structure of the pri-miRNA causes an initial cleavage event that releases the precursor miRNA (pre-miRNA) hairpin. The pre-miRNAs are then processed further into miRNA/miRNA* duplexes during the second step of miRNA biogenesis (Bologna et al., 2009;Cuperus et al., 2010;Mateos et al., 2010;Song et al., 2010;Werner et al., 2010;Wang et al., 2019;. Here, both endonucleolytic cleavage events are catalyzed by a RNase III type enzyme, Dicer-Like 1 (DCL1), which recognizes the stem-loop structures in both pri-miRNAs and pre-miRNAs (Park et al., 2002;Sun et al., 2018;Wei et al., 2021). For the efficient and correct processing of plant pri-miRNAs by DCL1, the assistance of microprocessor core proteins, including the double-stranded RNA-binding protein Hyponastic Leaves 1 (HYL1) and the zinc finger protein SERRATE (SE), is needed (Han et al., 2004;Lobbes et al., 2006;Yang et al., 2006;Dong et al., 2008;Achkar et al., 2018;Wang et al., 2019;Xie et al., 2021). The final products of plant microprocessor action, miRNA/ miRNA* duplexes, are further methylated at the 3ʹ ends by Hua Enhancer 1 methylase (HEN1), which provides protection against degradation after export from the nucleus to the cytoplasm (Yang et al., 2006;Plotnikova et al., 2013;Baranauske et al., 2015).
In the last step of miRNA biogenesis, the miRNA/miRNA* duplex is incorporated into an Argonaute 1 (AGO1) miRISC. The miRNA* is then removed, leaving only the miRNA, which is stabilized within the miRISC complex (Vaucheret et al., 2004;Baumberger and Baulcombe, 2005;Yu et al., 2017;Tomassi et al., 2020). For a long time, it was hypothesized that the methylated miRNA/miRNA* duplexes are exported to Fig. 1. miRNA biogenesis in plants. RNA polymerase II (RNA Pol II) transcribes miRNA genes. Primary transcripts (pri-miRNAs) contain a cap at their 5ʹ end and are polyadenylated at their 3ʹ end. The activity of the DCL1 endonuclease first cuts off the imperfectly folded stem-and-loop structure of pri-miRNAs, resulting in a shorter stem-loop hairpin (pre-miRNA). This reaction entails the concerted action and physical interactions of SE, HYL1, DCL1, and CBC. Throughout the process, several auxiliary proteins participate in the interactions with the microprocessor core, among others TGH, DDL, and CHR2. The resulting pre-miRNAs are further excised by DCL1 to mature miRNA/miRNA* duplexes. Next, the 3ʹ-ends of miRNA/miRNA* duplexes are methylated by HEN1. The guide miRNA strand is then integrated into AGO1 protein with the aid of CRM1 protein and exported to the cytoplasm, where it regulates the cognate mRNA level. the cytoplasm by the exportin Hasty (HST) (Park et al., 2005). Interestingly, recent studies show that the miRISC complex is assembled in the nucleus and then exported to the cytosol in a Chromosomal Maintenance 1 (CRM1; also named EXPORTIN1)/nuclear-export-dependent manner ( Fig. 1) (Bologna et al., 2018;B. Zhang et al., 2020). In support of this model, a recent study by Cambiagno et al. (2021) reported that HST is associated with the formation of the miRNA biogenesis complex for a specific subset of MIR genes. This permits the promotion of pri-miRNA transcription and processing rather than the direct export of processed miRNAs from the nucleus. In addition, HST has also been shown to be autonomously required for facilitating both cell-to-cell and phloembased long-distance movement of fully processed miRNAs (Brioudes et al., 2021). Thus, based on the latest data, AGO1 is the main protein responsible for miRNA export from the nucleus to the cytoplasm.
During the past decade, the development of high-throughput sequencing has enabled the identification of numerous miR-NAs sourced from various non-model plant species, from red and green algae to seed plants (Molnár et al., 2007;Amborella Genome Project, 2013;Alaba et al., 2015;Xia et al., 2015;Valli et al., 2016;Cock et al., 2017;Noronha Fernandes-Brum et al., 2017;Ortiz et al., 2019). Interestingly, among the data present in the literature and in available land plant sequence databases, miRNA repertoire characterization has been performed for only four bryophyte species, comprising one hornwort, two liverworts, and one moss (Khraiwesh et al., 2010;Arazi, 2012;Alaba et al., 2015;Lin et al., 2016;Tsuzuki et al., 2016;. This contrasts with the data available for 75 angiosperm species, as registered in miRBase version 22 (Kozomara et al., 2019).
According to the available data, the main molecular framework responsible for the biogenesis, degradation, and mode of action of miRNAs was likely already established in the common ancestor of all land plants; this is because major components of the miRNA pathway have been identified in all studied bryophytes (Arif et al., 2012(Arif et al., , 2022Lin et al., 2016;Tsuzuki et al., 2016;You et al., 2017). However, the almost complete lack of mechanistic studies on miRNA biogenesis in bryophytes impedes our understanding of miRNA production in these plants.
While the precise phylogenetic relationships among the bryophytes remain unresolved, the critical position of liverworts-one of the earliest land plant lineages-is unequivocal (Puttick et al., 2018). Therefore, liverworts represent a key group that comparative genomics can use to resolve fundamental questions in plant evolution. In this review, we summarize findings from studies of miRNAs and genes related to miRNA biogenesis from representatives of two liverwort lineages: Pellia endiviifolia, with simple thallus morphology (simple thalloid liverwort), and Marchantia polymorpha, which shows functional specialization within specific thallus sections/ parts (complex thalloid liverwort). Furthermore, we present an analysis of M. polymorpha MIR gene structures based on the available genomic and transcriptomic data. We also provide new insights regarding the conservation of miRNA biogenesis machinery by focusing on both the core processing machinery components that have been described in M. polymorpha and by identifying the auxiliary and regulatory proteins influencing the folding, stability, and/or processing of pri-miRNAs.

Identification of conserved land plant miRNAs present in liverworts
The first report describing the liverwort miRNA repertoire came from P. endiviifolia, in which combined analyses of the transcriptome, small RNAs (sRNAs), and degradome provided experimental evidence for target mRNA turnover by identified conserved and novel miRNAs. In total, 486 miRNAs belonging to 345 families (311 conserved and 34 novel families) were identified; these included numerous vascular-plant-specific miRNAs, of which 10 conserved miRNA families were found to be common to all land plants (miR156, miR160, miR166, miR171, miR319/159, miR390, miR395, miR396, miR408, and miR535) .
Small RNA profiling was also used to identify miRNAs and their precursors by comparing the sRNA sequencing data with genomic DNA and transcriptome profiles in M. polymorpha. Based on the obtained data, 265 miRNAs were identified and were mapped to 124 genomic loci (Tsuzuki et al., 2016;Bowman et al., 2017). In M. polymorpha, only seven miRNAs (miR160, miR166, miR171, miR319/miR159, miR390, miR408, and miR529) are conserved with those of other land plants (Tsuzuki et al., 2016;Bowman et al., 2017).
As liverworts belong to the bryophytes, we analyzed conserved miRNAs within this group (Fig. 2). Only six conserved miRNA families (miR319/159, miR160, miR166/165, miR171/170, miR408, and miR536) are shared by the hornwort Anthoceros angustus, the liverworts P. endiviifolia and M. polymorpha, and the moss Physcomitrium patens. Moreover, an additional three conserved miRNA families are shared between P. endiviifolia, Physcomitrium, and Anthoceros (miR156/157, miR477, and miR535), and three miRNA families (miR390, miR529, and miR1030) are shared by M. polymorpha, P. endiviifolia, and Physcomitrium Tsuzuki et al., 2016;Bowman et al., 2017;Kozomara et al., 2019;Li et al., 2020;. Our analysis suggests that a specific set of six conserved miRNAs common to all bryophytes and higher plants studied so far probably arose before the emergence of a common ancestor of all land plants (Deng et al., 2017;. Six miRNA families that are highly conserved between these two groups also target mostly conserved mRNAs (miR536 seems to target different mRNAs in bryophytes and higher plants). This implies that they play similar and important roles in functional regulatory networks that coordinate plant development in all land plants (Tsuzuki et al., 2016;Xia et al., 2016;Lin and Bowman, 2018;Guo et al., 2020).
Liverwort-specific miRNAs shared by M. polymorpha and P. endiviifolia An earlier comparison of the set of miRNAs found in the two liverwort species M. polymorpha and P. endiviifolia revealed no overlap in non-conserved miRNAs, with one exception. This was the single P. endiviifolia-specific Pen-miR8163, which shares a similar sequence with the M. polymorphaspecific Mpo-miR11737a. miR8163 in P. endiviifolia seems to represent a single-member family, whereas in M. polymorpha the Mpo-miR11737 family consists of two members, Mpo-miR11737a and Mpo-miR11737b ( Fig. 3A) (Tsuzuki et al., 2016). Next-generation sequencing results suggest that Mpo-miR11737a is more abundant than Mpo-miR11737b. Furthermore, these data also show that Mpo-miR11737a is up-regulated in male vegetative thalli (Tsuzuki et al., 2016). However, the detailed expression profile of its P. endiviifolia ortholog, Pen-miR8163, is not known (Zielezinski et al., 2015).
Analysis of the M. polymorpha genome (MarpolBase database) revealed the presence of a short RNA sequence similar to Pen-miR8170. Interestingly, this sRNA is encoded within the precursor of the M. polymorpha-specific Mpo-miR11865, and thus represents the miRNA* of Mpo-miR11865 ( Fig. 3B) (Tsuzuki et al., 2016;Bowman et al., 2017). Moreover, it has been found that the expression of Mpo-miR11865 is relatively low in M. polymorpha, and its miRNA* was not annotated as the miRNA* of Mpo-miR11865. Thus, it cannot be excluded that the miRNA* of Mpo-miR11865 also acts as a miRNA, as it does in P. endiviifolia.
We also identified Mpo-miR11889, which is a homolog of Pen-miR8185 (Fig. 3C). Interestingly, Mpo-miR11889 is strongly up-regulated in M. polymorpha antheridiophores, similarly to Pen-miR8185, which is up-regulated in male generative thalli (Zielezinski et al., 2015;Tsuzuki et al., 2016). Careful inspection of microtranscriptomic data for M. polymorpha and P. endiviifolia thus has revealed the presence of three miRNAs common to the genera Marchantia and Pellia. Given that Pellia and Marchantia shared a last common ancestor ~350 million years ago (i.e. in the Paleozoic era), the presence of miRNAs common to both lineages suggests that they play important roles in liverwort physiology. Moreover, finding only three miRNAs common to P. endiviifolia and M. polymorpha supports a model of frequent gain and loss of MIR genes during evolution, as has been previously suggested (Fahlgren et al., 2007;Axtell et al., 2007).

The function of liverwort miRNAs
In recent years, significant progress has been made in characterizing the biological roles of key miRNAs in various plant species. To fully understand the mode of action of miRNAs, it is imperative to define miRNA targets. In the first part of this section, we sum up what is known about miRNA-target modules when comparing liverworts with other land plants, and in the second part we focus on the effects of miRNAs on the development and physiology of M. polymorpha.

Evolutionary conservation of miRNAs and their targets in liverworts
Analyses of miRNAs and their targets in P. endiviifolia and M. polymorpha revealed conserved as well as fluid miRNAtarget relationships. In both liverwort species three conserved miRNAs, miR160, miR166, and miR408, have been demonstrated to target orthologous genes from the Auxin Response Factor (ARF), Class III Homeodomain-Leucine Zipper (HD-ZIP III), and plantacyanin families, respectively (Floyd and Bowman, 2004;Alaba et al., 2015;Lin et al., 2016). Similarly to angiosperms, heavy metal ATP synthase protein 8 (ATPase 8) and ppa004802m targeted by miR408 were also found as conserved targets in P. endiviifolia . In M. polymorpha, the conserved miRNAs miR171, miR319, and miR529 were shown to guide mRNA cleavage of the same target gene families as found in other land plants; these gene families included Gibberellic Repressor Scarecrow (GRAS), Transcriptional activator Myb (MYB), and Squamosa Promoter Binding Protein-Like (SPL) transcription factors, respectively (Lin et al., 2016;Tsuzuki et al., 2016;Flores-Sandoval et al., 2018a;Lin and Bowman, 2018). The identified examples of conserved miRNA-target nodes in liverworts support previous findings indicating that the evolutionarily conserved miRNA-target pairs are crucial for the regulation of ancestral transcription factors or physiological enzymes involved in basic plant development or tolerance to stresses Axtell and Meyers, 2018).
A large number of target mRNAs identified in liverworts appear to be unique to each of the species studied, a phenomenon also observed in other land plants. Still, many of these non-conserved miRNA-target dependencies may play important roles for proper P. endiviifolia or M. polymorpha growth and development, as newly identified targets mostly encode nucleic acid-binding proteins and proteins with various catalytic activities Lin et al., 2016;Tsuzuki et al., 2016). Several M. polymorpha transcription factors whose orthologs in higher plants regulate developmental processes (including MADS-box, homeodomain, and zinc finger transcription factors) have been found to be under the control of M. polymorpha-specific miRNAs (Mpo-miR11681, -miR11687, -miR11693, and -miR11677). Furthermore, M. polymorpha orthologs of angiosperm genes known to be involved in the development of the meristem (Barely Any Meristem 2; AtBAM2) and the epidermal layer of the seed (Defective Kernel 1; AtDEK1) were also identified as targets of the M. polymorpha-specific miRNAs Mpo-miR11687 and Mpo-miR11677, respectively (Lin et al., 2016). However, the Arabidopsis thaliana transcripts AtBAM2 and AtDEK1 are known to not be under miRNA control. Thus, the role of M. polymorpha BAM and DEK1 ortholog proteins is an interesting question that has yet to be answered. Efforts to understand the M. polymorphaspecific miRNA-target interactions will help to elucidate how these species-specific miRNAs affect specific growth processes in M. polymorpha. Moreover, these data indicate that during the evolution of liverworts, miRNA-mediated regulation of gene expression was implemented to control diverse developmental processes.
Interestingly, degradome data have shown that two novel miRNAs, Mpo-miR11707.1 and Mpo-miR11707.2, are produced from a single precursor and target the MpAGO1 transcript (Lin et al., 2016). Similarly, in the moss P. patens miR904 targets three PpAGO1 homologs (PpAGO1a-c), whereas in A. thaliana the single AtAGO1 mRNA is targeted by miR168 (Floyd and Bowman, 2004;Lin et al., 2016;Tsuzuki et al., 2016;Flores-Sandoval et al., 2018a;Lin and Bowman, 2018). Taken together, these data suggest that miRNA-mediated feedback FPO regulation of the miRNA biogenesis pathway evolved independently in the bryophyte and angiosperm lineages, with different miRNAs regulating the same target, namely AGO1 mRNA. These data underline how important it is to control the miRNA level in plants and that AGO1 feedback regulation plays a crucial role in this process.
The available literature shows that only a small number of the evolutionarily conserved miRNA-target modules have been identified in bryophytes and higher plants. Nonetheless, the majority of liverwort-identified miRNAs appear to target transcription factor mRNAs that are involved in various developmental processes, signaling, and stress response pathways, just as they do in angiosperms. Encouragingly, growing evidence shows that non-conserved miRNA-target nodes are also regulators of gene expression and function as crucial determinants of plant growth and development.

Physiological relevance of the miRNA action in M. polymorpha development
To date, only a few functional studies of miRNA action have been published for liverworts. One of the first studies was by Tsuzuki et al. (2016), who demonstrated the effects of ectopic miRNA expression on M. polymorpha development. Plants overexpressing miR166a and miR319b exhibited clear morphological phenotypic changes, namely curling of the thalli to the ventral or dorsal side, respectively. In addition, the ectopic expression of either miRNA triggered the absence of gemma cups, although gemmae were produced in miR319b-overexpressing plants. While miR166a overexpression reduced the transcript levels of the predicted target gene, MpC3HDZ, this MpMYB-like transcript was not affected by miR319b overexpression. In addition, rapid amplification of cDNA ends (RACE) experiments have shown that MpMYB-like mRNA is cleaved at the predicted miR319b cleavage site in M. polymorpha (Tsuzuki et al., 2016). The authors proposed two possible explanations: (i) miR319b repression of MpMYB33-like gene transcript may occur mainly at the translational level, like APETALA2 repression by miR172 in Arabidopsis (Aukerman and Sakai, 2003;Tsuzuki et al., 2016); or (ii) another, as yet uncharacterized, miR319b target is implicated in this regulatory mode of action during M. polymorpha development. Overall, this study revealed that miR166a and miR319b play important roles in the proper horizontal expansion of thalli and in the differentiation of gemma cups in M. polymorpha (Tsuzuki et al., 2016).
In another study, it was shown that MpARF3, encoding class C Auxin Response Factor, is the primary gene regulated by miR160. Both loss-and gain-of-function MpARF3 alleles resulted in pleiotropic defects manifested throughout the gametophytic stages of the life cycle. This indicated that MpARF3 is a negative regulator of differentiation and promoter of cell proliferation. Mparf3 knockout mutants exhibited smaller cell size, more differentiated air pores, the production of a greater number of pegged rhizoids, and more gametophores per thallus area, with ectopic antheridia and fused archegoniophores. Conversely, Mpmir160 knockout mutants seemed to lack proper differentiation of air pores, gemma cups, or gametophores under inductive conditions and produced more branched and convoluted thalli. These observations indicate that the MpARF3 protein and miR160 play roles as specific modulators of the cell differentiation rate and totipotency in M. polymorpha (Flores-Sandoval et al., 2018a). Next, a differential gene expression analysis of Mparf3/Mpmir160 mutant plant transcriptomes by Flores-Sandoval et al. (2018b) identified a set of genes that were genuinely dependent on the MpARF3/MpMIR160 regulatory module. Of these, two genes belonging to the SPL transcription factor family were identified as repressed by MpARF3. At the same time, two miRNA precursors in which Mpo-miR11671 and Mpo-miR529c are embedded and which target MpSPL1 and MpSPL2 gene transcripts, respectively, were found to be activated by MpARF3. In angiosperms, it is known that SPL proteins are involved in controlling the transition from vegetative to reproductive life stages. By combining their results with data from the literature, it was pointed out that MpARF3 may antagonize the reproductive transition during the life cycle of M. polymorpha. This hypothesis seems plausible, given that the specific expression pattern of both MpSPL genes has been observed along with an explicit expression peak in gametangiophores and a simultaneous down-regulation of both miRNA precursors at this developmental stage (Flores-Sandoval et al., 2018b). Indeed, Tsuzuki et al. (2019) showed that miRNA529c regulates the reproductive transition in M. polymorpha by repressing MpSPL2 gene expression during vegetative growth, thereby impeding the appearance of reproductive branches and reproductive organs. Analysis of Mpspl2 knockout mutant lines suggests that MpSPL2 may also play a role in the promotion of the reproductive transition and is needed for the proper development of reproductive branches. However, the MpSPL2 gene is not essential for gamete differentiation or gamete function, as crosses between male and female Mpspl2 plants resulted in viable spore production (Tsuzuki et al., 2019). Thus, the miR156/529-SPL module seems to control the vegetativeto-reproductive phase change in bryophyte development, as it does in angiosperms. However, further investigation of the MpSPL1/Mpo-MIR11671 regulatory module is necessary.
Bryophyte rhizoids have been found to be morphologically similar to vascular plant root hairs (Honkanen et al., 2018). Basic helix-loop-helix (bHLH) transcription factors encoded by Root Hair Defective Six-like (RSL) class I genes have been shown to be positive regulators of both root hair development in vascular plants and rhizoid development in non-vascular plants (Menand et al., 2007;Proust et al., 2016;Kim et al., 2017). Interestingly, Honkanen et al. (2018) identified a negative regulator of the M. polymorpha MpRSL1 gene that encodes a novel miRNA, MpFew Rhizoids1 (MpFRH1) (previously described as Mpo-miR11861 by Tsuzuki et al., 2016). A comparison of MpFRH1 gain-of-function and loss-of-function mutant plant phenotypes revealed decreases and increases, respectively, in the mRNA levels of the predicted target gene, MpRSL1. In addition, RACE experiments also showed that the MpRSL1 transcript is cleaved within the MpFRH1 miRNA recognition site. Although RSL class I genes in Arabidopsis are negatively regulated by the homeodomain protein Glabra2 (AtGL2), there is no known miRNA involved in the regulation of AtRSL in Arabidopsis (Proust et al., 2016;Honkanen et al., 2018). Therefore, the discovery of a new species-specific miRNA in M. polymorpha revealed a different and independent mechanism of repression of this RSL class I gene despite the fact that its function is conserved compared with other land plants (Honkanen et al., 2018).
Taken together, the studies described here revealed the first insights into the regulatory role played by miRNAs in liverworts. However, despite increasing interest in studies of the microtranscriptomes of bryophytes in the recent past, new research is required to elucidate both the compositional and the functional implications of the miRNA-target modules in bryophytes.

M. polymorpha miRNAs are mainly encoded in protein-coding genes
Previous research found that M. polymorpha miRNAs are encoded either between or within protein-coding genes (Lin et al., 2016;Bowman et al., 2017). We therefore examined various transcriptomic datasets covering both miRNA precursors and mature miRNAs to analyse their coding position. Sequence annotations for both the precursor and mature miRNAs of M. polymorpha were retrieved from the GFF annotation file of the MpTak v6.1 reference genome assembly (Montgomery et al., 2020). These annotations covered 256 pre-miRNAs and 319 miRNAs. We used BEDTools to retrieve a list of 107 pre-miRNAs whose nucleotide sequences did not overlap with other annotated genes and thus may represent independent transcription units (Quinlan and Hall, 2010). The other 149 pre-miRNAs overlapped with protein-coding genes (Table S1 at Zenodo; Pietrykowska et al., 2022). From this sample, 95 pre-miRNAs were found to reside in genes encoding proteins that are orthologs of known proteins or that possess known protein motifs. These 95 pre-miRNAs were found in 5ʹ or 3ʹ untranslated regions (n=36), in the coding sequence (n=5), within introns (n=51), and at the exon-intron boundary (n=3). The remaining 54 pre-miRNAs were annotated within the coding sequences of genes possessing annotated open reading frames; the majority of these were shorter than 150 amino acids in length and had no similarity to known proteins or protein motifs.
To learn more about the 107 pre-miRNAs that could have been encoded by independent transcriptional units, we compared transcriptomic data from MarpolBase covering each pre-miRNA from the 107 miRNAs and analyzed all available transcriptomic read coverage along the DNA sequences in which the pre-miRNAs were embedded. We combined RNAseq and cap analysis gene expression sequencing (CAGE-seq) data to develop criteria for determining the transcription start site; the 3ʹend of the gene was defined as the last RNA-seq read aligned to the genomic sequence (Montgomery et al., 2020). Our approach enabled us to identify gene structures for 24 pre-miRNAs representing 23 independent transcriptional units (Table 1); these comprised 15 intronless MIR genes, one polycistronic locus comprising two pre-miRNAs (Mpo-pre-miR390 and Mpo-pre-miR11676), and seven MIR genes containing a single intron. In the intron-containing genes, the miRNA stem-loop structure is predominantly located in the first exon, with a few notable exceptions. These include one gene (Mpo-MIR11685) in which the pre-miRNA is located within the second exon, and two genes (Mpo-MIR11773 and Mpo-MIR11815) in which the precursor hairpin sequence is located within the intron. With regard to Mpo-MIR11678, two alternative pri-miRNA structures, resulting from alternative splicing, were identified. The longer Mpo-pri-miR11678.1 is over 2500 nt in length and was proposed based on RNA-seq reads profiled from antheridiophores; it is characterized by the presence of a 1108 nt long intron. The second pri-miRNA, Mpo-primiR11678.2, is shorter, containing a 125 nt long intron; this structure was identified based on the RNA-seq expression profiles of archegoniophores and heat-stress-treated plants. However, in both cases, the location of the pre-miRNA was the same, with both being located within the first exon. Since the level of Mpo-miR11678 is similar in both antheridiophores and archegoniophores, it is rather unlikely that the observed alternative splicing events affect the maturation efficiency of this miRNA (Tsuzuki et al., 2016).
All introns detected in the proposed gene structures were of the classical U2 type. However, for more than 80 of the 107 analyzed pre-miRNAs we were not able to propose a gene structure. There are two reasons for this. First, in many cases there was insufficient RNA-seq coverage in the genomic region where the analyzed pre-miRNA was located. Second, in some cases the miRNA precursor was found on the opposite strand from the corresponding annotated M. polymorpha gene. Future studies are therefore needed to clarify the specific structures of many M. polymorpha MIR genes.
When data published by Bowman et al. (2017) from 265 M. polymorpha MIR genes are taken into consideration, our analysis reveals that the majority of M. polymorpha miRNAs are encoded within protein-coding genes; this situation contrasts with that in A. thaliana, where the majority of MIR genes are independent transcriptional units (Szarzynska et al., 2009;Bielewicz et al., 2012;Zielezinski et al., 2015). Unlike M. polymorpha, in P. endiviifolia, the characterized MIR genes represent independent transcriptional units. However, it should be taken into consideration that only 10 MIR gene structures are known for P. endiviifolia . This is also in contrast to  P. patens, where, like in M. polymorpha, the majority of miRNA hairpins overlap with positions of annotated protein-coding loci (Axtell et al., 2007). Moreover, our analysis identified one polycistronic locus in M. polymorpha, while in P. patens 25% of the miRNAs are polycistronic (Axtell et al., 2007). Taking these data together, the organization of MIR genes in different bryophyte species differs considerably and may mirror the existence of additional regulatory layers that shape final miRNA levels.
Higher plant miRNA biogenesis proteins are encoded within the M. polymorpha genome For many years, researchers have focused investigations of genes and proteins from the microprocessor complex on angiosperms (Han et al., 2004;Kurihara and Watanabe, 2004;Park et al., 2005;Xie et al., 2005;Lobbes et al., 2006;Kurihara et al., 2006). By contrast, much less is known about the specific structure and composition of such genes in early terrestrial plants.
Genes encoding proteins found in the higher plant microprocessor core have previously been described in bryophytes. First, studies on the moss P. patens revealed the presence of all microprocessor core encoding genes, including PpDCL1, PpSE, and PpHYL1. Moreover, other important players in miRNA biogenesis, such as PpHEN1 and PpAGO1, were also found (Arif et al., 2013). The presence of several genes involved in miRNA biogenesis, including DCL1, SE, HEN1, and AGO1, was also reported in a representative of the most basal land plant lineage, the hornwort Folioceros fuciformis . The existence of a DCL1-encoding gene was also discovered in the recently published genome of another hornwort, A. angustus . Our search for microprocessor complex-encoding genes in liverworts belonging to the three classes Haplomitriopsida, Jungermanniopsida, and Marchantiopsida, performed using the 1000 Plants platform (One Thousand Plant Transcriptomes Initiative, 2019), identified transcripts of key microprocessor proteins in the vast majority of liverwort species. These included DCL1, HYL1, and SE, as well as transcripts of genes encoding early miRNA biogenesis proteins such as Tough (TGH), Dawdle (DDL), and HEN1 (Table S2 at Zenodo). Since a large number of proteins involved in plant miRNA biogenesis have been identified thus far, we decided to assign these proteins into four sections. The first section focuses on the basic core proteins of the microprocessor (DCL1, SE, and HYL1), the second on auxiliary proteins that directly interact with microprocessor core components, the third focuses on selected regulatory proteins that indirectly affect the efficiency of miRNA biogenesis, and the fourth section describes proteins involved in miRISC formation. To understand more about the main liverwort miRNA biogenesis proteins, as well as the auxiliary and regulatory proteins that fine-tune miRNA production in higher plants, we searched the M. polymorpha genome (the only sequenced and annotated representative liverwort genome available) for orthologs of A. thaliana proteins that have been identified as being related to miRNA biogenesis.
The core proteins of the microprocessor in M. polymorpha In the M. polymorpha genome, several genes encoding core microprocessor proteins involved in miRNA biogenesis have previously been identified. These include MpDCL1a, MpDCL1b, and MpHYL1 (Lin et al., 2016;Tsuzuki et al., 2016;Bowman et al., 2017;Lin and Bowman, 2018). The SE gene, which encodes one of the key components of the higher plant microprocessor, has thus far not been identified in the M. polymorpha genome. Our investigations revealed the existence of all genes that encode components of the plant core microprocessor, including MpSE. Table 2 presents all identified M. polymorpha core protein genes as well as other genes that encode proteins interacting with the core microprocessor proteins.

Selected microprocessor auxiliary proteins in M. polymorpha
Next, we searched the M. polymorpha genome for genes coding for auxiliary proteins that are known to affect miRNA biogenesis in higher plants by interacting with core microprocessor components (Table 2). Here, we identified MpTGH, which encodes an ortholog of Arabidopsis RNA-binding TGH protein. In A. thaliana, this protein is known to be involved in the early stages of miRNA biogenesis and to interact with DCL1 and mRNA adenosine methylase (MTA), which introduces m6A into mRNA and pri-miRNAs (Table 2). TGH is also known to bind to both pri-miRNAs and pre-miRNAs and is necessary for the effective interaction of pri-miRNA with HYL1 (Ren et al., 2012;Bhat et al., 2020). We also identified MpDDL, which encodes a conserved forkhead-associated (FHA) domain-containing protein that in Arabidopsis interacts directly with the DCL1 protein, thereby affecting pri-miRNA processing (S. . In A. thaliana, it has been shown that the Cell Division Cycle 5-like (CDC5), Protein Pleiotropic Regulatory Locus 1 (PRL1), and RNA helicase MAC7 proteins form a MOS4associated complex (MAC) that interacts with DCL1/SE, pri-miRNAs, and HYL1 (Zhang et al., 2013;Jia et al., 2017;Manavella et al., 2019). Here, CDC5 and PRL1 interactions modulate pri-miRNA levels, resulting in miRNA accumulation, and work in concert as a complex to increase DCL1 activity (Zhang et al., 2013(Zhang et al., , 2014. We identified genes encoding all MAC components in M. polymorpha, including MpCDC5, MpPRL1, and MpMAC7. We also found an ortholog of one of the intrinsically disordered proteins that in A. thaliana has been designated Constitutive Alterations in the small RNAs Pathways 9   Table 2. Continued (CARP9). We designated the gene identified in the M. polymorpha genome as MpCARP9. In Arabidopsis, the CARP9 protein is known as a novel miRNA pathway partner. It interacts with HYL1 and AGO1, leading to efficient miRNA loading on AGO1 (Tomassi et al., 2020). The presence of a M. polymorpha ortholog of the Negative On TATA Less 2 (NOT2) protein is another example of a microprocessor-interacting protein that is shared by A. thaliana and M. polymorpha. NOT2 operates as a general factor during miRNA biogenesis in angiosperms, where it promotes miRNA gene transcription and facilitates effective DCL1 recruitment through their direct interaction (Wang et al., 2013). Unlike A. thaliana, which possesses two paralog proteins, NOT2a (AT1G07705) and NOT2b (AT5G59710), in M. polymorpha only one gene encoding NOT2a (MpNOT2a) is present (Table 2).
In addition, the Nuclear cap-binding protein subunit 1 (CBP80) and Nuclear cap-binding protein subunit 2 (CBP20) proteins found in A. thaliana form a nuclear cap-binding complex (CBC) and are involved in pri-miRNA biogenesis and splicing control by directly interacting with the SE protein (Laubinger et al., 2008;Kim et al., 2008). In our analysis of the M. polymorpha genome, we identified genes that encode both CBC subunits (MpCBP20 and MpCBP80). Speth et al. (2013) reported that the Receptor of Activated Kinase (RACK1) may interact with SE, suggesting that RACK1 plays a role in A. thaliana miRNA processing. The RACK1 protein is known to contain beta-transducin (WD40) repeats and therefore is highly conserved among plant species (Ullah et al., 2008;Speth and Laubinger, 2014). In Arabidopsis, this gene has three paralogs known as RACK1A, RACK1B, and RACK1C. However, in the M. polymorpha genome we were able to identify only MpRACK1C.
The A. thaliana HEN1 protein was identified as an RNA methyltransferase that methylates the 3ʹ ends of the miRNA/ miRNA* duplex. This protein is also known to interact directly with DCL1 and HYL1 Yu et al., 2005;Baranauske et al., 2015). We identified an MpHEN1 proteincoding gene in the M. polymorpha genome (Lin et al., 2016;Tsuzuki et al., 2016;Bowman et al., 2017).
Finally, in M. polymoprha we identified Serrate Associated Protein 1 (MpSEAP1) gene which encodes an ortholog of A. thaliana SEAP1 protein. In Arabidopsis SEAP1 interacts with SE and affects the splicing and early biogenesis stages of a set of miRNAs (M. . We were unable to identify orthologs of some proteincoding genes that are known to affect miRNA biogenesis in Arabidopsis by directly interacting with microprocessor core elements. Of these, we did not find Chromatin remodelling factor 2 (CHR2, also known as Brahma, BRM), which encodes one of the two catalytic ATPase subunits of the chromatin remodeling complex. The CHR2 protein remodels the secondary structure of miRNA primary transcripts in A. thaliana and consequently down-regulates miRNA production (Wang et al.,   Orthofinder v. 2.5.4 using BLAST as the main sequence similarity search tool (Emms and Kelly, 2019). Pairwise alignments between orthologous protein sequences were calculated using the needle tool from the EMBOSS package (Rice et al., 2000). Protein domains were annotated using the PfamScan tool and the Pfam 34.0 database (Mistry et al., 2021).  Thouly et al., 2020). In A. thaliana, Cycling DOF factor 2 (CDF2) interacts with DCL1 and acts in the same pathway as miR156 or miR172 to control flowering (Sun et al., 2015). The absence of CDF2 in M. polymorpha may be linked to the absence of miR156 and miR172.

Selected microprocessor regulatory proteins in M. polymorpha
Here we report the identification of selected protein-coding genes known to affect miRNA biogenesis in higher plants that indirectly interact with microprocessor core components. It was recently found that the HST1 protein is involved in the early stages of miRNA biogenesis and participates in the cell-to-cell transfer of mature miRNAs in A. thaliana (Brioudes et al., 2021;Cambiagno et al., 2021). An orthologous HST1 gene is present in the M. polymorpha genome.
mRNA adenosine methylase (MTA) was identified as a general methyltransferase that introduces m6A to RNA Pol II transcripts in Arabidopsis (e.g. mRNA and pri-miRNA) (Zhong et al., 2008;Bhat et al., 2020). MTA deficiency results in the down-regulation of a set of miRNAs by causing inefficient recruitment of microprocessor components related to chromatin; this therefore affects pri-miRNA structure (Bhat et al., 2020). We identified a gene encoding MTA in the M. polymorpha genome (MpMTA). Moreover, we also identified all components of the m6A methyltransferase complex in M. polymorpha [MTA, N6-adenosine-methyltransferase non-catalytic subunit MTB (MTB), FKBP12-interacting protein of 37 kDa (FIP 37), Protein virilizer homolog (VIR), and E3 ubiquitinprotein ligase Hakai (HAKAI)].
Recently, it has been shown that different DEAD-box helicases play important roles as splicing factors to control miRNA biogenesis Hou et al., 2021). For instance, genes encoding Small 1 (SMA1) and RNA helicase 8 (RH8) have been identified as MpSMA1 and MpRH8 within the M. polymorpha genome. According to Q. , Arabidopsis RH8 (together with RH6 and RH12) controls the formation of dicing-bodies (D-bodies) by interacting with the SE protein . However, we did not identify RH6 and RH12 in the M. polymorpha genome. It would therefore be interesting to study whether the MpRH8 protein is implicated in the formation of D-bodies in M. polymorpha. Yan et al. (2017) reported that SNF1-Related Protein Kinase subfamily 2 (SnRK2) proteins are required for the proper accumulation of miRNAs in A. thaliana by regulating miRNA processing factors. Specifically, the authors showed that the function of the HYL1 protein in Arabidopsis depends on SnRK2 kinases when plants experience osmotic stress. The M. polymorpha genome contains two genes encoding SnRK2 proteins, MpSnRK2A and MpSnRK2B (Bowman et al., 2017). However, their putative role in miRNA biogenesis in M. polymorpha requires experimental validation.

Selected proteins forming the miRISC complex
The AGO1 protein as a component of the miRISC complex, guided by miRNA, cleaves target mRNAs at a specific site, namely between the 10th and 11th nucleotides of the miRNA sequence (counting from the 5ʹ end). It has previously been reported that the M. polymorpha genome encodes an MpAGO1 protein (Lin 2016;Tsuzuki et al., 2016;Lin and Bowman, 2018). Another important protein from the Argonaute family known from Arabidopsis studies is AGO10. This protein acts as a negative regulator of AGO1 transcript levels. Moreover, it specifically binds miR165/166 targeting HD-ZIP III transcription factor mRNAs. Its presence is known to be critical for proper shoot apical meristem formation (Zhu et al., 2011;Yu et al., 2017). The previously reported absence of AGO10 in M. polymorpha is especially intriguing due to the presence of miRNA165/166 in its genome (Lin et al., 2016;Tsuzuki et al., 2016).
In conclusion, the set of conserved miRNAs in terrestrial plants is limited in number. Moreover, while individual miR-NAs appear to be highly species specific, the proteins involved in the microprocessor, as well as those proteins that interact with it directly or indirectly, are very similar and likely originated very early during the evolution of land plants.

M. polymorpha orthologs of A. thaliana proteins involved in miRNA biogenesis exhibit the same domain architecture
We investigated the orthologous genes in M. polymorpha and Arabidopsis that encode proteins involved in miRNA biogenesis. Specifically, we looked for M. polymorpha sequences that are orthologous to the Arabidopsis microprocessor core proteins (n=3; DCL1, HYL1, and SE), auxiliary (n=45), and regulatory (n=19) proteins, as well as miRISC formation proteins (n=3; AGO1, AGO4, and AGO10). In total, out of 70 Arabidopsis proteins involved in miRNA biogenesis, we found 54 orthologs (77%) in M. polymorpha ( Table 2).
The three core microprocessor proteins (DCL1, HYL1, and SE) in M. polymorpha have the same sequential arrangement of domains along their protein sequence (i.e. domain architecture) as the corresponding Arabidopsis proteins. The genome of M. polymorpha was previously found to encode two DCL1 protein genes, MpDCL1a and MpDCL1b, orthologous to DCL1 from Arabidopsis (Lin et al., 2016;Tsuzuki et al., 2016;Bowman et al., 2017). However, our analysis revealed that AtDCL1 shares an almost two-fold higher sequence identity with MpDCL1b (62%) than with MpDCL1a (32%). Moreover, MpDCL1b and AtDCL1 proteins share the same domain architecture, whereas MpDCL1a protein lacks the C-terminal domain that binds double-stranded RNA (DND1_DSRM) (Table 2, Fig. 4). Although two other microprocessor core proteins, HYL1 and SE, are less conserved than DCL1 in terms of sequence identity with their Arabidopsis orthologs (28% and 47%, respectively), both the MpHYL1 and MpSE proteins have a common domain architecture with the corresponding AtHYL1 and AtSE proteins, implying functional similarity of the orthologs (Table 2). Notably, the low sequence identity between AtHYL1 and MpHYL1 is due to high sequence variability at the C-terminus of both proteins: AtHYL1 contains six tandem repeats of 28 amino acid motifs, a pattern that is not observed in MpHYL1. However, it should be noted that repetitive extension of AtHYL1 is a characteristic feature of proteins found only in Brassicaceae (Lu and Fedoroff, 2000).
Most of the microprocessor auxiliary proteins reported in Arabidopsis (n=45) are also present in M. polymorpha (n=34; 77%). The protein sequences of the microprocessor auxiliary orthologs are similar in length and their mean sequence identity is 51% (±17% standard deviation). The highest sequence identity between orthologs is observed for Protein Phosphatase X (PPX) (86%), XAP5 Circadian Timekeeper (XCT) (79%), and SNRK2.6 (76%) proteins, and the least conserved sequences include Sickle (SIC) (24%), CARP9 (25%), and the scaffold zinc-knuckle protein (ZCCHC8A) (25%). Although AtSIC and MpSIC lack recognizable protein domains, both proteins were predicted as orthologs, suggesting their functional coherence. Most auxiliary microprocessor orthologs between Arabidopsis and M. polymorpha preserve the same domain content and order; however, five of them-DCL4, HEN1, High Osmotic Stress Gene Expression 5 (HOS5), Elongator complex protein 2 (ELP2), and PRL1-show differences in their domain architecture (Fig. 4, Table 2). These differences include either small variations in the number of domain copies (e.g. five copies of the WD40 domain in AtPRL1 and four copies in MpPRL1) or changes in domain content. The alterations in domain content relate to the absence of one domain in one of the orthologs. Specifically, in the DCL4 and HEN1 proteins, one domain in one ortholog seems to be replaced with another functionally similar domain in the corresponding sequence (DEAD with ResIII and Methyltransf_31 with Methyltransf_21, respectively). Given that alterations in domain architecture can impact protein function, we cannot exclude the possibility that the proteins in M. polymorpha with different domain compositions have acquired or lost some functionality (Hegyi and Gerstein, 2001). Functional studies are needed to verify this hypothesis. As such, a recent paper investigating the functional comparison of HEN1 proteins from Arabidopsis and M. polymorpha has shown that MpHEN1 protein activity is comparable with AtHEN1, and both proteins show substrate specificity toward duplex sRNA. Consistently, MpHEN1 structure modeling based on the AtHEN1 crystal structure revealed conservation of the M. polymorpha protein architecture, with several critical motifs responsible for RNA binding and methylation being highly conserved (Sanobar et al., 2021).
Furthermore, most of the regulatory microprocessor proteins reported for Arabidopsis (n=19) are also present in M. polymorpha (n=14; 74%) ( Table 2). The level of sequence conservation of these proteins (sequence identity: 47% ±17%) is very similar to that observed for auxiliary microprocessor orthologs. In addition, all microprocessor regulatory orthologs preserve the same domain architecture. The prime example is Chromosome region maintenance 1 (Exportin-1) MpCRM1 (MpXPO1), which shows a very high sequence identity of 79% with both AtExportin1A (AtXPO1A) and AtCRM1B (AtXPO1B) proteins and also preserves the same arrangement of six different domains.
In the case of miRISC formation proteins (AGO1, AGO4, and AGO10), we found orthologs of AGO1 and AGO4 between Arabidopsis and M. polymorpha. At the protein sequence level, the AGO1 orthologs show higher identity (67%) than the AGO4 orthologs (40%). We also observed differences at the protein domain level: MpAGO1 lacks a recognizable N-terminal glycine-rich domain that is present in AtAGO1, while MpAGO4 lacks a Mid domain, which is part of the Piwi lobe of AtAGO4.
Finally, we were unable to identify M. polymorpha orthologs for 16 Arabidopsis proteins, including 10 auxiliary microprocessor proteins [e.g. CHR1, CDF, two serine/arginine-rich splicing factors (RS40/41)], five regulatory microprocessor proteins [e.g. HEN1 Suppressor1 (HESO1), Mitogen-Activated Protein Kinase 3 (MPK3), PPX2], and AGO10 protein belonging to miRISC formation proteins (Table 2). We suggest that either these proteins are not encoded by the M. polymorpha genome or their corresponding amino acid sequences lack sufficient similarity for them to be assigned as orthologs. However, as has been reported (Bowman et al., 2017), some of the Arabidopsis proteins represent multigene families that encompass more protein members than the corresponding proteins found in M. polymorpha. For example, the Arabidopsis genome encodes three RACK proteins, whereas M. polymorpha has only one RACK member. In addition, the RH gene family is more expanded in Arabidopsis (i.e. RH6, RH8, RH12, RH27, and RH42) than in M. polymorpha (RH8 and RH42).

Concluding remarks
Bryophytes are thought to have shared a common ancestor with higher plants, and miRNAs may have had important regulatory functions before their divergence. In this review, we have summarized the available data on MIR gene structures, miRNA biogenesis, and miRNA-mediated gene regulation analysis datasets. These demonstrate both significant similarities and important differences between Bryophyta and higher plants. Conserved miRNA families in M. polymorpha (and other Bryophyta species) are limited; most miRNA families appear to be highly species specific. In particular, we show that M. polymorpha and P. endiviifolia share only three miRNA families. Moreover, in contrast to the higher plants, in which MIR genes are usually found in intergenic regions, most identified pre-miRNAs in M. polymorpha overlap with protein-coding genes. By identifying gene structures representing independent transcriptional units, we found that M. polymorpha MIR genes are mostly intronless. As in angiosperms, the majority of liverwortidentified miRNAs appear to target transcription factors and therefore play a role in the corresponding regulatory pathway. With respect to miRNA biogenesis, the microprocessor core protein composition and its auxiliary and regulatory proteins in M. polymorpha are highly conserved. We also found a high degree of similarity between the domain architecture in M. polymorpha and in Arabidopsis. However, we also found important exceptions, where crucial players involved in miRNA biogenesis differ at selected domains (including the AGO1, HEN1, and DCL1a proteins). These observations require additional studies to determine the functional consequences of these differences. All data summarized in this review confirm the hypothesis that miRNA-mediated silencing machinery emerged during the early stages of plant evolution. Future studies on the microtranscriptomes of M. polymorpha and other Bryophyta are necessary to provide a better understanding of land plant evolution. Moreover, such studies may help answer questions about how land plants have adapted to the terrestrial environment.